<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
<link href="../style/ebook.css" type="text/css" rel="stylesheet">
</head>
<body>
<h1>FAQ</h1>
<div class="section" id="why-isn-t-my-task-getting-scheduled">
<h2 class="sigil_not_in_toc">Why isn&#x2019;t my task getting scheduled?</h2>
<p>There are very many reasons why your task might not be getting scheduled.
Here are some of the common causes:</p>
<ul class="simple">
<li>Does your script &#x201C;compile&#x201D;, can the Airflow engine parse it and find your
DAG object. To test this, you can run <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">list_dags</span></code> and
confirm that your DAG shows up in the list. You can also run
<code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">list_tasks</span> <span class="pre">foo_dag_id</span> <span class="pre">--tree</span></code> and confirm that your task
shows up in the list as expected. If you use the CeleryExecutor, you
may want to confirm that this works both where the scheduler runs as well
as where the worker runs.</li>
<li>Does the file containing your DAG contain the string &#x201C;airflow&#x201D; and &#x201C;DAG&#x201D; somewhere
in the contents? When searching the DAG directory, Airflow ignores files not containing
&#x201C;airflow&#x201D; and &#x201C;DAG&#x201D; in order to prevent the DagBag parsing from importing all python
files collocated with user&#x2019;s DAGs.</li>
<li>Is your <code class="docutils literal notranslate"><span class="pre">start_date</span></code> set properly? The Airflow scheduler triggers the
task soon after the <code class="docutils literal notranslate"><span class="pre">start_date</span> <span class="pre">+</span> <span class="pre">scheduler_interval</span></code> is passed.</li>
<li>Is your <code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code> set properly? The default <code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code>
is one day (<code class="docutils literal notranslate"><span class="pre">datetime.timedelta(1)</span></code>). You must specify a different <code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code>
directly to the DAG object you instantiate, not as a <code class="docutils literal notranslate"><span class="pre">default_param</span></code>, as task instances
do not override their parent DAG&#x2019;s <code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code>.</li>
<li>Is your <code class="docutils literal notranslate"><span class="pre">start_date</span></code> beyond where you can see it in the UI? If you
set your <code class="docutils literal notranslate"><span class="pre">start_date</span></code> to some time say 3 months ago, you won&#x2019;t be able to see
it in the main view in the UI, but you should be able to see it in the
<code class="docutils literal notranslate"><span class="pre">Menu</span> <span class="pre">-&gt;</span> <span class="pre">Browse</span> <span class="pre">-&gt;Task</span> <span class="pre">Instances</span></code>.</li>
<li>Are the dependencies for the task met. The task instances directly
upstream from the task need to be in a <code class="docutils literal notranslate"><span class="pre">success</span></code> state. Also,
if you have set <code class="docutils literal notranslate"><span class="pre">depends_on_past=True</span></code>, the previous task instance
needs to have succeeded (except if it is the first run for that task).
Also, if <code class="docutils literal notranslate"><span class="pre">wait_for_downstream=True</span></code>, make sure you understand
what it means.
You can view how these properties are set from the <code class="docutils literal notranslate"><span class="pre">Task</span> <span class="pre">Instance</span> <span class="pre">Details</span></code>
page for your task.</li>
<li>Are the DagRuns you need created and active? A DagRun represents a specific
execution of an entire DAG and has a state (running, success, failed, &#x2026;).
The scheduler creates new DagRun as it moves forward, but never goes back
in time to create new ones. The scheduler only evaluates <code class="docutils literal notranslate"><span class="pre">running</span></code> DagRuns
to see what task instances it can trigger. Note that clearing tasks
instances (from the UI or CLI) does set the state of a DagRun back to
running. You can bulk view the list of DagRuns and alter states by clicking
on the schedule tag for a DAG.</li>
<li>Is the <code class="docutils literal notranslate"><span class="pre">concurrency</span></code> parameter of your DAG reached? <code class="docutils literal notranslate"><span class="pre">concurrency</span></code> defines
how many <code class="docutils literal notranslate"><span class="pre">running</span></code> task instances a DAG is allowed to have, beyond which
point things get queued.</li>
<li>Is the <code class="docutils literal notranslate"><span class="pre">max_active_runs</span></code> parameter of your DAG reached? <code class="docutils literal notranslate"><span class="pre">max_active_runs</span></code> defines
how many <code class="docutils literal notranslate"><span class="pre">running</span></code> concurrent instances of a DAG there are allowed to be.</li>
</ul>
<p>You may also want to read the Scheduler section of the docs and make
sure you fully understand how it proceeds.</p>
</div>
<div class="section" id="how-do-i-trigger-tasks-based-on-another-task-s-failure">
<h2 class="sigil_not_in_toc">How do I trigger tasks based on another task&#x2019;s failure?</h2>
<p>Check out the <code class="docutils literal notranslate"><span class="pre">Trigger</span> <span class="pre">Rule</span></code> section in the Concepts section of the
documentation</p>
</div>
<div class="section" id="why-are-connection-passwords-still-not-encrypted-in-the-metadata-db-after-i-installed-airflow-crypto">
<h2 class="sigil_not_in_toc">Why are connection passwords still not encrypted in the metadata db after I installed airflow[crypto]?</h2>
<p>Check out the <code class="docutils literal notranslate"><span class="pre">Connections</span></code> section in the Configuration section of the
documentation</p>
</div>
<div class="section" id="what-s-the-deal-with-start-date">
<h2 class="sigil_not_in_toc">What&#x2019;s the deal with <code class="docutils literal notranslate"><span class="pre">start_date</span></code>?</h2>
<p><code class="docutils literal notranslate"><span class="pre">start_date</span></code> is partly legacy from the pre-DagRun era, but it is still
relevant in many ways. When creating a new DAG, you probably want to set
a global <code class="docutils literal notranslate"><span class="pre">start_date</span></code> for your tasks using <code class="docutils literal notranslate"><span class="pre">default_args</span></code>. The first
DagRun to be created will be based on the <code class="docutils literal notranslate"><span class="pre">min(start_date)</span></code> for all your
task. From that point on, the scheduler creates new DagRuns based on
your <code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code> and the corresponding task instances run as your
dependencies are met. When introducing new tasks to your DAG, you need to
pay special attention to <code class="docutils literal notranslate"><span class="pre">start_date</span></code>, and may want to reactivate
inactive DagRuns to get the new task onboarded properly.</p>
<p>We recommend against using dynamic values as <code class="docutils literal notranslate"><span class="pre">start_date</span></code>, especially
<code class="docutils literal notranslate"><span class="pre">datetime.now()</span></code> as it can be quite confusing. The task is triggered
once the period closes, and in theory an <code class="docutils literal notranslate"><span class="pre">@hourly</span></code> DAG would never get to
an hour after now as <code class="docutils literal notranslate"><span class="pre">now()</span></code> moves along.</p>
<p>Previously we also recommended using rounded <code class="docutils literal notranslate"><span class="pre">start_date</span></code> in relation to your
<code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code>. This meant an <code class="docutils literal notranslate"><span class="pre">@hourly</span></code> would be at <code class="docutils literal notranslate"><span class="pre">00:00</span></code>
minutes:seconds, a <code class="docutils literal notranslate"><span class="pre">@daily</span></code> job at midnight, a <code class="docutils literal notranslate"><span class="pre">@monthly</span></code> job on the
first of the month. This is no longer required. Airflow will now auto align
the <code class="docutils literal notranslate"><span class="pre">start_date</span></code> and the <code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code>, by using the <code class="docutils literal notranslate"><span class="pre">start_date</span></code>
as the moment to start looking.</p>
<p>You can use any sensor or a <code class="docutils literal notranslate"><span class="pre">TimeDeltaSensor</span></code> to delay
the execution of tasks within the schedule interval.
While <code class="docutils literal notranslate"><span class="pre">schedule_interval</span></code> does allow specifying a <code class="docutils literal notranslate"><span class="pre">datetime.timedelta</span></code>
object, we recommend using the macros or cron expressions instead, as
it enforces this idea of rounded schedules.</p>
<p>When using <code class="docutils literal notranslate"><span class="pre">depends_on_past=True</span></code> it&#x2019;s important to pay special attention
to <code class="docutils literal notranslate"><span class="pre">start_date</span></code> as the past dependency is not enforced only on the specific
schedule of the <code class="docutils literal notranslate"><span class="pre">start_date</span></code> specified for the task. It&#x2019;s also
important to watch DagRun activity status in time when introducing
new <code class="docutils literal notranslate"><span class="pre">depends_on_past=True</span></code>, unless you are planning on running a backfill
for the new task(s).</p>
<p>Also important to note is that the tasks <code class="docutils literal notranslate"><span class="pre">start_date</span></code>, in the context of a
backfill CLI command, get overridden by the backfill&#x2019;s command <code class="docutils literal notranslate"><span class="pre">start_date</span></code>.
This allows for a backfill on tasks that have <code class="docutils literal notranslate"><span class="pre">depends_on_past=True</span></code> to
actually start, if that wasn&#x2019;t the case, the backfill just wouldn&#x2019;t start.</p>
</div>
<div class="section" id="how-can-i-create-dags-dynamically">
<h2 class="sigil_not_in_toc">How can I create DAGs dynamically?</h2>
<p>Airflow looks in your <code class="docutils literal notranslate"><span class="pre">DAGS_FOLDER</span></code> for modules that contain <code class="docutils literal notranslate"><span class="pre">DAG</span></code> objects
in their global namespace, and adds the objects it finds in the
<code class="docutils literal notranslate"><span class="pre">DagBag</span></code>. Knowing this all we need is a way to dynamically assign
variable in the global namespace, which is easily done in python using the
<code class="docutils literal notranslate"><span class="pre">globals()</span></code> function for the standard library which behaves like a
simple dictionary.</p>
<div class="code python highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">):</span>
    <span class="n">dag_id</span> <span class="o">=</span> <span class="s1">&apos;foo_</span><span class="si">{}</span><span class="s1">&apos;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
    <span class="nb">globals</span><span class="p">()[</span><span class="n">dag_id</span><span class="p">]</span> <span class="o">=</span> <span class="n">DAG</span><span class="p">(</span><span class="n">dag_id</span><span class="p">)</span>
    <span class="c1"># or better, call a function that returns a DAG object!</span>
</pre>
</div>
</div>
</div>
<div class="section" id="what-are-all-the-airflow-run-commands-in-my-process-list">
<h2 class="sigil_not_in_toc">What are all the <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">run</span></code> commands in my process list?</h2>
<p>There are many layers of <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">run</span></code> commands, meaning it can call itself.</p>
<ul class="simple">
<li>Basic <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">run</span></code>: fires up an executor, and tell it to run an
<code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">run</span> <span class="pre">--local</span></code> command. if using Celery, this means it puts a
command in the queue for it to run remote, on the worker. If using
LocalExecutor, that translates into running it in a subprocess pool.</li>
<li>Local <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">run</span> <span class="pre">--local</span></code>: starts an <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">run</span> <span class="pre">--raw</span></code>
command (described below) as a subprocess and is in charge of
emitting heartbeats, listening for external kill signals
and ensures some cleanup takes place if the subprocess fails</li>
<li>Raw <code class="docutils literal notranslate"><span class="pre">airflow</span> <span class="pre">run</span> <span class="pre">--raw</span></code> runs the actual operator&#x2019;s execute method and
performs the actual work</li>
</ul>
</div>
<div class="section" id="how-can-my-airflow-dag-run-faster">
<h2 class="sigil_not_in_toc">How can my airflow dag run faster?</h2>
<p>There are three variables we could control to improve airflow dag performance:</p>
<ul class="simple">
<li><code class="docutils literal notranslate"><span class="pre">parallelism</span></code>: This variable controls the number of task instances that the airflow worker can run simultaneously. User could increase the parallelism variable in the <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code>.</li>
<li><code class="docutils literal notranslate"><span class="pre">concurrency</span></code>: The Airflow scheduler will run no more than <code class="docutils literal notranslate"><span class="pre">$concurrency</span></code> task instances for your DAG at any given time. Concurrency is defined in your Airflow DAG. If you do not set the concurrency on your DAG, the scheduler will use the default value from the <code class="docutils literal notranslate"><span class="pre">dag_concurrency</span></code> entry in your <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code>.</li>
<li><code class="docutils literal notranslate"><span class="pre">max_active_runs</span></code>: the Airflow scheduler will run no more than <code class="docutils literal notranslate"><span class="pre">max_active_runs</span></code> DagRuns of your DAG at a given time. If you do not set the <code class="docutils literal notranslate"><span class="pre">max_active_runs</span></code> in your DAG, the scheduler will use the default value from the <code class="docutils literal notranslate"><span class="pre">max_active_runs_per_dag</span></code> entry in your <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code>.</li>
</ul>
</div>
<div class="section" id="how-can-we-reduce-the-airflow-ui-page-load-time">
<h2 class="sigil_not_in_toc">How can we reduce the airflow UI page load time?</h2>
<p>If your dag takes long time to load, you could reduce the value of <code class="docutils literal notranslate"><span class="pre">default_dag_run_display_number</span></code> configuration in <code class="docutils literal notranslate"><span class="pre">airflow.cfg</span></code> to a smaller value. This configurable controls the number of dag run to show in UI with default value 25.</p>
</div>
<div class="section" id="how-to-fix-exception-global-variable-explicit-defaults-for-timestamp-needs-to-be-on-1">
<h2 class="sigil_not_in_toc">How to fix Exception: Global variable explicit_defaults_for_timestamp needs to be on (1)?</h2>
<p>This means <code class="docutils literal notranslate"><span class="pre">explicit_defaults_for_timestamp</span></code> is disabled in your mysql server and you need to enable it by:</p>
<ol class="arabic simple">
<li>Set <code class="docutils literal notranslate"><span class="pre">explicit_defaults_for_timestamp</span> <span class="pre">=</span> <span class="pre">1</span></code> under the mysqld section in your my.cnf file.</li>
<li>Restart the Mysql server.</li>
</ol>
</div>
<div class="section" id="how-to-reduce-airflow-dag-scheduling-latency-in-production">
<h2 class="sigil_not_in_toc">How to reduce airflow dag scheduling latency in production?</h2>
<ul class="simple">
<li><code class="docutils literal notranslate"><span class="pre">max_threads</span></code>: Scheduler will spawn multiple threads in parallel to schedule dags. This is controlled by <code class="docutils literal notranslate"><span class="pre">max_threads</span></code> with default value of 2. User should increase this value to a larger value(e.g numbers of cpus where scheduler runs - 1) in production.</li>
<li><code class="docutils literal notranslate"><span class="pre">scheduler_heartbeat_sec</span></code>: User should consider to increase <code class="docutils literal notranslate"><span class="pre">scheduler_heartbeat_sec</span></code> config to a higher value(e.g 60 secs) which controls how frequent the airflow scheduler gets the heartbeat and updates the job&#x2019;s entry in database.</li>
</ul>
</div>
</body>
</html>